Analyzing Election Contributions in West Virginia by Michael Kuehn

Prior to conducting an analysis using R, the data was reviewed and audited in Python. Please see Python code. I noticed that contributors had entered a wide range of occupations when making their contributions. In order to have a more meaningful analysis of contributors’ occupations, I classified occupations into general categories (e.g. MEDICAL, EDUCATION, RETIRED, etc.). The original data was read into a Python script, and a new csv file was output with occupation categories. Any contributions that were negative numbers were skipped since these are refunds. The focus is on contributions made, in order to judge the excitement and engagement of individual donors during the election cycle.
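The cleaning step described above can be sketched roughly as follows. This is a minimal illustration, not the script that was actually used: the keyword map and the helper names `categorize` and `add_categories` are hypothetical, and the real mapping evidently differed (for example, it produced 12 categories). Only the column names come from the data.

```python
import csv
import io

# Hypothetical keyword map. The report does not list the real rules,
# so these keywords are illustrative assumptions only.
CATEGORY_KEYWORDS = {
    "MEDICAL": ("PHYSICIAN", "NURSE"),
    "EDUCATION": ("TEACHER", "PROFESSOR"),
    "RETIRED": ("RETIRED",),
}


def categorize(occupation):
    """Return the first category whose keyword appears in the occupation."""
    text = (occupation or "").upper()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(word in text for word in keywords):
            return category
    return "OTHER"


def add_categories(in_file, out_file):
    """Copy rows to out_file with an occupation_category column, skipping refunds."""
    reader = csv.DictReader(in_file)
    fields = list(reader.fieldnames) + ["occupation_category"]
    writer = csv.DictWriter(out_file, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if float(row["contb_receipt_amt"]) < 0:
            continue  # negative amounts are refunds and are skipped
        row["occupation_category"] = categorize(row["contbr_occupation"])
        writer.writerow(row)


# Tiny in-memory example with made-up rows (not taken from the FEC file).
sample = io.StringIO(
    "contbr_occupation,contb_receipt_amt\n"
    "PHYSICIAN,100\n"
    "ATTORNEY,-50\n"
    "COAL MINER,25\n"
)
result = io.StringIO()
add_categories(sample, result)
print(result.getvalue())
```

With real files, the two `io.StringIO` objects would be replaced by files opened with `newline=""`, as the `csv` module documentation recommends.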

The first stage of the analysis is to look at a summary of the data. Most of the columns in the data are categorical variables. Tables were created for each of these variables to examine the counts and relative frequencies within the data. The contribution amount is a quantitative variable and will be analyzed using measures of center and spread.

The following columns will be explored in this analysis: ‘cand_nm’, ‘contbr_city’, ‘contbr_employer’, ‘contbr_occupation’, ‘occupation_category’, ‘contb_receipt_amt’, ‘contb_receipt_dt’, and ‘election_tp’.

The code block below includes the structure of the data, a summary of the data, and two sets of tables. The first table lists the top 10 (if there are 10 categories) counts for each of the columns. The second table lists the same information but as percentages. A table of some detailed summary statistics for the quantitative variable ‘contb_receipt_amt’ is also included.

## 'data.frame':    9099 obs. of  19 variables:
##  $ cmte_id            : Factor w/ 18 levels "C00430470","C00430512",..: 16 17 16 14 14 14 14 14 14 14 ...
##  $ cand_id            : Factor w/ 17 levels "P00003186","P00003251",..: 7 1 7 17 17 17 17 17 17 17 ...
##  $ cand_nm            : Factor w/ 17 levels "Biden, Joseph R Jr",..: 13 17 13 8 8 8 8 8 8 8 ...
##  $ contbr_nm          : Factor w/ 2880 levels "ABRAHAM, LAUREL",..: 2168 600 2736 2329 2126 2126 2126 2126 2126 2126 ...
##  $ contbr_city        : Factor w/ 347 levels "ALBRIGHT","ALDERSON",..: 214 222 282 19 32 32 32 32 32 32 ...
##  $ contbr_st          : Factor w/ 1 level "WV": 1 1 1 1 1 1 1 1 1 1 ...
##  $ contbr_zip         : int  252650336 259019766 24976 25801 25818 25818 25818 25818 25818 25818 ...
##  $ contbr_employer    : Factor w/ 1265 levels "","(NONE)","3M HEALTH INFORMATION SYSTEMS",..: 40 490 403 1 465 465 465 465 465 465 ...
##  $ contbr_occupation  : Factor w/ 858 levels "","227 CAPITOL ST",..: 258 395 301 698 510 510 510 510 510 510 ...
##  $ occupation_category: Factor w/ 12 levels "EDUCATION","EXECUTIVE",..: 6 6 6 9 1 1 1 1 1 1 ...
##  $ contb_receipt_amt  : num  20 2300 25 50 11 5.18 5 15 1 2.98 ...
##  $ contb_receipt_dt   : Factor w/ 601 levels "1-Apr-07","1-Apr-08",..: 338 407 257 599 55 94 114 132 132 206 ...
##  $ receipt_desc       : Factor w/ 10 levels "","REATTRIBUTION / REDESIGNATION REQUESTED (AUTOMATIC)",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ memo_cd            : Factor w/ 2 levels "","X": 1 1 1 1 1 1 1 1 1 1 ...
##  $ memo_text          : Factor w/ 21 levels "","* EARMARKED CONTRIBUTION: SEE BELOW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ form_tp            : Factor w/ 2 levels "SA17A","SA18": 1 1 1 1 1 1 1 1 1 1 ...
##  $ file_num           : int  336959 313015 336959 330012 330012 330012 330012 330012 330012 330012 ...
##  $ tran_id            : Factor w/ 9109 levels "10000477","10002226",..: 2761 8306 3006 8683 8693 8695 8696 8697 8698 8699 ...
##  $ election_tp        : Factor w/ 2 levels "G2008","P2008": 2 2 2 2 2 2 2 2 2 2 ...
##       cmte_id          cand_id                        cand_nm    
##  C00431445:5087   P80003338:5087   Obama, Barack          :5087  
##  C00431569:1601   P00003392:1601   Clinton, Hillary Rodham:1601  
##  C00430470:1082   P80002801:1314   McCain, John S         :1314  
##  C00431205: 302   P40002347: 302   Edwards, John          : 302  
##  C00432914: 252   P80000748: 252   Paul, Ron              : 252  
##  C00446104: 232   P80003478: 127   Huckabee, Mike         : 127  
##  (Other)  : 543   (Other)  : 416   (Other)                : 416  
##                    contbr_nm           contbr_city   contbr_st
##  BRADLEY, ROBERT        :  59   CHARLESTON   :1375   WV:9099  
##  STERNS, CAROLYN        :  54   MORGANTOWN   : 935            
##  HURSH, DANIEL          :  49   HUNTINGTON   : 506            
##  JENNINGS, ALAN         :  42   SHEPHERDSTOWN: 364            
##  PARANAC, LEONARD R. MR.:  39   WHEELING     : 262            
##  RINKER, SARAH          :  37   HARPERS FERRY: 253            
##  (Other)                :8819   (Other)      :5404            
##    contbr_zip                     contbr_employer
##  Min.   :        0   NOT EMPLOYED         :1996  
##  1st Qu.:253032865   SELF EMPLOYED        : 898  
##  Median :254431113   RETIRED              : 571  
##  Mean   :230753993                        : 310  
##  3rd Qu.:261478043   INFORMATION REQUESTED: 268  
##  Max.   :268840007   (Other)              :5045  
##                      NA's                 :  11  
##              contbr_occupation occupation_category contb_receipt_amt
##  RETIRED              :2484    OTHER     :2770     Min.   :   1     
##  ATTORNEY             : 603    RETIRED   :2489     1st Qu.:  30     
##  PHYSICIAN            : 371    MEDICAL   :1467     Median : 100     
##  NOT EMPLOYED         : 225    EDUCATION : 795     Mean   : 210     
##  INFORMATION REQUESTED: 203    EXECUTIVE : 783     3rd Qu.: 200     
##  (Other)              :5207    UNEMPLOYED: 239     Max.   :2300     
##  NA's                 :   6    (Other)   : 556                      
##   contb_receipt_dt                                receipt_desc  memo_cd 
##  30-Sep-08: 170                                         :8686    :8565  
##  16-Oct-08: 135    REDESIGNATION FROM PRIMARY           : 207   X: 534  
##  31-Oct-08: 122    REATTRIBUTION/REDESIGNATION REQUESTED:  81           
##  23-Oct-08: 101    REDESIGNATION TO                     :  56           
##  24-Oct-08: 100    REDESIGNATION REQUESTED              :  48           
##  31-Jul-08: 100    REATTRIBUTION REQUESTED              :  13           
##  (Other)  :8371    (Other)                              :   8           
##                                  memo_text     form_tp    
##                                       :8288   SA17A:8577  
##  OVF TRANSFER                         : 298   SA18 : 522  
##  REDESIGNATION FROM PRIMARY           : 207               
##  REATTRIBUTION/REDESIGNATION REQUESTED:  81               
##  REDESIGNATION TO                     :  56               
##  ORIGINAL TRANSACTION                 :  54               
##  (Other)                              : 115               
##     file_num          tran_id     election_tp 
##  Min.   :294891   10000477:   1   G2008:2790  
##  1st Qu.:353643   10002226:   1   P2008:6309  
##  Median :753761   10003971:   1               
##  Mean   :597461   10006491:   1               
##  3rd Qu.:753821   10008089:   1               
##  Max.   :877004   10008114:   1               
##                   (Other) :9093
## $cand_nm
## x
##           Obama, Barack Clinton, Hillary Rodham          McCain, John S 
##                    5087                    1601                    1314 
##           Edwards, John               Paul, Ron          Huckabee, Mike 
##                     302                     252                     127 
##            Romney, Mitt     Giuliani, Rudolph W   Thompson, Fred Dalton 
##                     109                      95                      67 
##  Brownback, Samuel Dale 
##                      46 
## 
## $contbr_city
## x
##    CHARLESTON    MORGANTOWN    HUNTINGTON SHEPHERDSTOWN      WHEELING 
##          1375           935           506           364           262 
## HARPERS FERRY   MARTINSBURG       BECKLEY   PARKERSBURG  CHARLES TOWN 
##           253           251           200           188           165 
## 
## $contbr_employer
## x
##                           NOT EMPLOYED 
##                                   1996 
##                          SELF EMPLOYED 
##                                    898 
##                                RETIRED 
##                                    571 
##                                        
##                                    310 
##                  INFORMATION REQUESTED 
##                                    268 
##               WEST VIRGINIA UNIVERSITY 
##                                    262 
##                          SELF-EMPLOYED 
##                                    240 
## INFORMATION REQUESTED PER BEST EFFORTS 
##                                    107 
##                                   SELF 
##                                     93 
##                                   NONE 
##                                     79 
## 
## $contbr_occupation
## x
##               RETIRED              ATTORNEY             PHYSICIAN 
##                  2484                   603                   371 
##          NOT EMPLOYED INFORMATION REQUESTED             PROFESSOR 
##                   225                   203                   189 
##               TEACHER             HOMEMAKER                       
##                   189                   184                   133 
##                LAWYER 
##                   111 
## 
## $occupation_category
## x
##      OTHER    RETIRED    MEDICAL  EDUCATION  EXECUTIVE UNEMPLOYED 
##       2770       2489       1467        795        783        239 
##  HOMEMAKER      LEGAL    STUDENT  POLITICAL 
##        211        162         57         50 
## 
## $contb_receipt_dt
## x
## 30-Sep-08 16-Oct-08 31-Oct-08 23-Oct-08 24-Oct-08 31-Jul-08 30-Aug-08 
##       170       135       122       101       100       100        98 
## 29-Sep-08 30-Oct-08 31-Aug-08 
##        94        86        85 
## 
## $election_tp
## x
## P2008 G2008 
##  6309  2790
## $cand_nm
## x
##           Obama, Barack Clinton, Hillary Rodham          McCain, John S 
##              55.9072426              17.5953401              14.4411474 
##           Edwards, John               Paul, Ron          Huckabee, Mike 
##               3.3190460               2.7695351               1.3957578 
##            Romney, Mitt     Giuliani, Rudolph W   Thompson, Fred Dalton 
##               1.1979338               1.0440708               0.7363447 
##  Brownback, Samuel Dale 
##               0.5055501 
## 
## $contbr_city
## x
##    CHARLESTON    MORGANTOWN    HUNTINGTON SHEPHERDSTOWN      WHEELING 
##     15.111551     10.275854      5.561051      4.000440      2.879437 
## HARPERS FERRY   MARTINSBURG       BECKLEY   PARKERSBURG  CHARLES TOWN 
##      2.780525      2.758545      2.198044      2.066161      1.813386 
## 
## $contbr_employer
## x
##                           NOT EMPLOYED 
##                             21.9364765 
##                          SELF EMPLOYED 
##                              9.8692164 
##                                RETIRED 
##                              6.2754149 
##                                        
##                              3.4069678 
##                  INFORMATION REQUESTED 
##                              2.9453786 
##               WEST VIRGINIA UNIVERSITY 
##                              2.8794373 
##                          SELF-EMPLOYED 
##                              2.6376525 
## INFORMATION REQUESTED PER BEST EFFORTS 
##                              1.1759534 
##                                   SELF 
##                              1.0220903 
##                                   NONE 
##                              0.8682273 
## 
## $contbr_occupation
## x
##               RETIRED              ATTORNEY             PHYSICIAN 
##             27.299703              6.627102              4.077371 
##          NOT EMPLOYED INFORMATION REQUESTED             PROFESSOR 
##              2.472799              2.231014              2.077151 
##               TEACHER             HOMEMAKER                       
##              2.077151              2.022200              1.461699 
##                LAWYER 
##              1.219914 
## 
## $occupation_category
## x
##      OTHER    RETIRED    MEDICAL  EDUCATION  EXECUTIVE UNEMPLOYED 
## 30.4429058 27.3546544 16.1226508  8.7372239  8.6053412  2.6266623 
##  HOMEMAKER      LEGAL    STUDENT  POLITICAL 
##  2.3189361  1.7804154  0.6264425  0.5495109 
## 
## $contb_receipt_dt
## x
## 30-Sep-08 16-Oct-08 31-Oct-08 23-Oct-08 24-Oct-08 31-Jul-08 30-Aug-08 
## 1.8683372 1.4836795 1.3408067 1.1100121 1.0990219 1.0990219 1.0770414 
## 29-Sep-08 30-Oct-08 31-Aug-08 
## 1.0330806 0.9451588 0.9341686 
## 
## $election_tp
## x
##    P2008    G2008 
## 69.33729 30.66271
##      nbr.val     nbr.null       nbr.na          min          max 
##       9099.0          0.0          0.0          1.0       2300.0 
##        range          sum       median         mean      SE.mean 
##       2299.0    1910651.2        100.0        210.0          4.2 
## CI.mean.0.95          var      std.dev     coef.var 
##          8.2     160423.2        400.5          1.9
##           Obama, Barack Clinton, Hillary Rodham          McCain, John S 
##                  38.025                  23.954                  16.651 
##     Giuliani, Rudolph W           Edwards, John            Romney, Mitt 
##                   5.806                   5.545                   2.124 
##               Paul, Ron   Thompson, Fred Dalton  Brownback, Samuel Dale 
##                   2.119                   2.055                   1.378 
##          Huckabee, Mike        Richardson, Bill      Biden, Joseph R Jr 
##                   0.950                   0.662                   0.304 
##      Kucinich, Dennis J Tancredo, Thomas Gerald          Hunter, Duncan 
##                   0.211                   0.131                   0.047 
##    Gilmore, James S III     Dodd, Christopher J 
##                   0.026                   0.012

There are 9,099 different contributions recorded in this data set for West Virginia for the primary and general election cycles in 2008. 69% of the contributions were made during the primaries, and 31% were made during the general election. The highest activity occurred on September 30th, 2008 when 1.87% of the total contributions were made. The “Other” category of occupation is the most common category with 30.4%. The next highest occupation categories are Retired, Medical, Education, and Executive. Unemployed, Homemaker, Legal, Student, Political, and Religious make up less than 10% of the contributions. Contributions made from Charleston, WV made up 15% of the contributions. Barack Obama received a little over half of all of the contributions made. By actual amount, Obama received 38% of the sum of all contribution amounts.
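The spread statistics in the summary table above are tied together by standard formulas, which makes them easy to sanity-check. The Python sketch below is not the R code that produced the table. It recomputes the standard error, the 95% confidence half-width, and the coefficient of variation from the rounded values shown. The 1.96 multiplier assumes a normal approximation, so it may differ slightly from what the R summary function used.

```python
import math

# Rounded values copied from the summary table for contb_receipt_amt.
n = 9099          # nbr.val
mean = 210.0      # mean
std_dev = 400.5   # std.dev

# Standard error of the mean: sd / sqrt(n).
se_mean = std_dev / math.sqrt(n)

# Half-width of a 95% confidence interval (normal approximation, assumed).
ci_half_width = 1.96 * se_mean

# Coefficient of variation: sd relative to the mean.
coef_var = std_dev / mean

print(round(se_mean, 1), round(ci_half_width, 1), round(coef_var, 1))
```

A coefficient of variation near 2 means the standard deviation is roughly twice the mean, which is consistent with a strongly right-skewed amount distribution.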

Univariate Plots Section

I plotted each of the categorical variables of interest using bar charts. I created a histogram and boxplot for the contribution amounts.

[Bar charts: number of contributions by candidate (cand_nm), occupation category (occupation_category), and election type (election_tp)]

The three bar charts above show the number of contributions by candidate, occupation category, and election type. They give a general sense of how the number of contributions varies across these variables. It is easy to see that Obama received the most contributions, that Other and Retired are the most common occupation categories, and that more contributions were made during the primaries.

The plots for city, employer, reported occupation, and contribution date were created with a limited number of categories on the x axis. For example, the data set contains 1,265 distinct employers, and it would not be feasible to show all of these in one bar chart. This also justifies creating occupation categories to find more meaningful relationships between occupations and contributions.

The plots above show various summary statistics and how they compare across the various occupation categories, candidates, and cities. I placed a dot corresponding with either the mean or median on the plots to highlight any differences between the two measurements.

The plot showing the mean amounts by candidate was interesting because I noticed that the minimum dot for Gilmore was on the mean amount. I checked to see the contributions in the data for Gilmore, and my suspicion was confirmed – he only received one $500 contribution in April 2007.

##                   cand_nm contb_receipt_amt contb_receipt_dt
## 8331 Gilmore, James S III               500        19-Apr-07

The plots for the cities are not that useful because the number of contributions from the cities displayed is very small (usually only 1 contribution), which makes the summary statistics poor descriptors. I decided to make the same plots again, this time using the top cities by number of contributions. The difference is that major cities such as Charleston, Morgantown, and Wheeling are now displayed instead of cities with small numbers of contributions.
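The idea of ranking cities by contribution count before summarizing them can be sketched in Python as follows. This is an illustration rather than the R code behind the plots: the function name `top_city_stats` is hypothetical and the sample rows are made up.

```python
from collections import Counter, defaultdict
from statistics import mean, median


def top_city_stats(rows, n):
    """Return (city, count, mean, median) for the n cities with the most rows."""
    counts = Counter(city for city, _ in rows)
    amounts = defaultdict(list)
    for city, amount in rows:
        amounts[city].append(amount)
    return [
        (city, count, mean(amounts[city]), median(amounts[city]))
        for city, count in counts.most_common(n)
    ]


# Made-up rows: (contbr_city, contb_receipt_amt).
rows = [
    ("CHARLESTON", 100), ("CHARLESTON", 50), ("CHARLESTON", 300),
    ("MORGANTOWN", 25), ("MORGANTOWN", 75),
    ("ALDERSON", 2300),  # a single large gift would dominate a mean-based ranking
]
stats = top_city_stats(rows, 2)
for city, count, avg, mid in stats:
    print(city, count, avg, mid)
```

Ranking by count keeps one-contribution cities such as the made-up ALDERSON row out of the plot, so the mean and median describe groups with more than one observation.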

The final set of univariate plots are basic descriptive plots showing the distribution of the quantitative variable, contribution amount. Histograms and boxplots were created (one with a regular scale and one with a log10 scale, since the data is so heavily skewed to the right). On the boxplot, it is interesting to note the darker bands of jittered points. These appear to correspond with contribution amounts that are commonly made, such as $500 or $1000. The following is a list of the top 15 contribution amounts.

## 
##  100   25   50  250  500 1000  200   30   10 2300   20  150  300   15   35 
## 1798 1423 1415  718  454  341  300  272  199  182  150  146  144   98   91
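To put the frequency table above in context, the short Python sketch below adds up the 15 counts and expresses them as a share of all contributions. It assumes the table was computed on the same 9,099 rows as the earlier summary. The counts are copied from the table, and the variable names are illustrative.

```python
# Amount -> number of contributions, copied from the table above.
top_amounts = {
    100: 1798, 25: 1423, 50: 1415, 250: 718, 500: 454,
    1000: 341, 200: 300, 30: 272, 10: 199, 2300: 182,
    20: 150, 150: 146, 300: 144, 15: 98, 35: 91,
}
total_rows = 9099  # assumed to match the summary of the cleaned data

covered = sum(top_amounts.values())
share = 100 * covered / total_rows
print(covered, round(share, 1))
```

Under that assumption, the 15 most common amounts account for roughly 85% of all contributions, which fits the bands of jittered points seen on the boxplot.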

Univariate Analysis

What is the structure of your dataset?

The cleaned data includes information on 9,109 individual contributions (9,099 after the ten over-limit rows discussed below are removed). There are 19 columns, which are described here.

What is/are the main feature(s) of interest in your dataset?

The main features of interest are the candidates, contributor occupations, contributor cities, contribution amounts, and contribution dates.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

Of the 17 candidates who received contributions in this data, only a handful are probably worth investigating. For example, there are only 3 candidates (Obama, McCain, and Clinton) who received over 10% of the total number of contributions. Only 5 other candidates received more than 1% of the total number of contributions. This could be an argument to only focus on the data for Obama, McCain, and Clinton. Another case for focusing on the main candidates is that many of the other candidates dropped out early in the primary cycle. Edwards, Romney, and Huckabee dropped out in January, February, and March 2008, respectively (Democratic Primary & Republican Primary). Republicans Brownback, Gilmore, and Tancredo, who appear in the data, withdrew before the primaries (Withdrew before primaries). The other candidates in the data set received so few contributions that it is justified to focus on the main candidates of Clinton, McCain, and Obama in later sections.
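The 10% and 1% thresholds mentioned above can be checked directly from the candidate counts reported in the summary tables. The Python sketch below uses those counts. Only the ten most frequent candidates are listed, since the remaining seven each have fewer contributions than Brownback and therefore fall under 1%. The variable names are illustrative.

```python
# Contribution counts by candidate, copied from the $cand_nm table.
counts = {
    "Obama, Barack": 5087,
    "Clinton, Hillary Rodham": 1601,
    "McCain, John S": 1314,
    "Edwards, John": 302,
    "Paul, Ron": 252,
    "Huckabee, Mike": 127,
    "Romney, Mitt": 109,
    "Giuliani, Rudolph W": 95,
    "Thompson, Fred Dalton": 67,
    "Brownback, Samuel Dale": 46,
}
total = 9099  # all contributions in the cleaned data

shares = {name: 100 * count / total for name, count in counts.items()}
over_10 = [name for name, pct in shares.items() if pct > 10]
between_1_and_10 = [name for name, pct in shares.items() if 1 < pct <= 10]

print(over_10)
print(between_1_and_10)
```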

Did you create any new variables from existing variables in the dataset?

Yes, during the cleaning and processing of the file, I decided to place the occupations into different bins. There were too many variations of different occupations, and I wanted to be able to do some analysis on occupation and its effect on contributions.

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

The distribution of contribution amounts is heavily skewed to the right. This makes sense since there are probably many people who give small amounts, and a few donors who give higher or the maximum amounts. I did perform some adjusting of the data prior to loading it into R. I used Python code to create the different occupation categories so I could complete a more meaningful analysis. The original data has 1,265 different employer names and 858 different occupation names. I chose to create 12 occupation categories from what I thought were the most prevalent within the data (Other, Retired, Medical, Education, Executive, Unemployed, Homemaker, Legal, Student, Political, Self-Employed, and Religious).

Upon creating the boxplots for contribution amounts, I noticed something strange. There were several contribution amounts at $2300, and then the amounts jumped up to $4000 and $4600. I decided to investigate this further.

##                    contbr_nm               cand_nm contb_receipt_amt
## 257      HILDENBRAND, OLGA I         Edwards, John              4600
## 406  PETROPLUS, PARRY G. MR.   Giuliani, Rudolph W              4600
## 698        REED, CANDACE MS.   Giuliani, Rudolph W              4600
## 700   REED, JAMES W. MR. JR.   Giuliani, Rudolph W              4600
## 961       BOLEN, KENNETH MR. Thompson, Fred Dalton              4600
## 2723    MORGAN, CRAIG M. DR.        McCain, John S              4000
## 3515        FERRELL, VICKI L         Obama, Barack              4600
## 3518          FERRELL, JOE C         Obama, Barack              4600
## 4618         UMBERGER, SARAH         Obama, Barack              4600
## 4866     SHIMM, DAVID S. MR.        McCain, John S              4600
##                                             receipt_desc election_tp
## 257                REATTRIBUTION/REDESIGNATION REQUESTED       P2008
## 406                                    SEE REATTRIBUTION       P2008
## 698                                                            P2008
## 700                                                            P2008
## 961                                                            P2008
## 2723 REATTRIBUTION / REDESIGNATION REQUESTED (AUTOMATIC)       P2008
## 3515                                                           P2008
## 3518                                                           P2008
## 4618                                                           P2008
## 4866                                   SEE REATTRIBUTION       P2008

It seems that many of these entries required reattribution or redesignation. I researched what this means on the Federal Election Commission website and found that these contributions were over the $2300 limit that was in place for individual contributions in 2008 (FEC Contribution Limits).

I decided to look at one of the names from the “weird” contributions to see what was going on.

##               contbr_nm       cand_nm contb_receipt_amt
## 256 HILDENBRAND, OLGA I Edwards, John              2300
## 257 HILDENBRAND, OLGA I Edwards, John              4600
## 258 HILDENBRAND, OLGA I Edwards, John              2300
##                              receipt_desc election_tp contb_receipt_dt
## 256 REATTRIBUTION/REDESIGNATION REQUESTED       P2008        30-Jun-07
## 257 REATTRIBUTION/REDESIGNATION REQUESTED       P2008        30-Jun-07
## 258                                             G2008        30-Jun-07

It looks like the $4600 contribution was reallocated into two separate $2300 contributions, one for the primary cycle and one for the general cycle, which is allowable since $2300 is the limit per election. I decided to remove these rows (with contribution amounts of $4000 or more) from the data set.
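The reasoning for removing the over-limit rows can be sketched in Python using the three Hildenbrand rows shown above. This is not the R code that was actually run. The helper name `split_over_limit` is hypothetical, and using the $2300 limit as the cut-off is an assumption made for illustration.

```python
LIMIT = 2300  # per-election individual limit in 2008, as cited in the text


def split_over_limit(rows):
    """Separate rows within the limit from rows that exceed it."""
    kept = [r for r in rows if r["contb_receipt_amt"] <= LIMIT]
    flagged = [r for r in rows if r["contb_receipt_amt"] > LIMIT]
    return kept, flagged


# The three rows for one contributor, copied from the output above.
rows = [
    {"row": 256, "contb_receipt_amt": 2300, "election_tp": "P2008"},
    {"row": 257, "contb_receipt_amt": 4600, "election_tp": "P2008"},
    {"row": 258, "contb_receipt_amt": 2300, "election_tp": "G2008"},
]
kept, flagged = split_over_limit(rows)

# The kept rows already add up to the flagged amount, so leaving the
# $4600 row in place would count the same money twice.
print(sum(r["contb_receipt_amt"] for r in kept), flagged[0]["contb_receipt_amt"])
```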

Bivariate Plots Section

Since the majority of the variables are categorical, I used stacked bar charts as well as boxplots split by factor levels. I also created some heat maps to see any interesting patterns.

These first two bar charts show the relationship between candidate and city. The first chart is too hard to read, so I limited the second chart to include the “main” candidates. One can use this type of chart to look at relationships such as how McCain received a larger share of contributions from Charleston than Morgantown. The third chart shows the same type of information but categorized by occupation category. The final bar chart shows the total contribution amounts by election type. It is obvious that more money was contributed for the primary elections.

[Plots: two sets of three panels, one each for candidate (cand_nm), occupation category (occupation_category), and election type (election_tp)]

Boxplots categorized by candidate, occupation category, and election type were created to show the relationship between these variables and contribution amounts. I decided to “zoom in” a bit to see the candidate boxplots a little better. It is interesting to see that approximately 75% of the Obama contribution amounts are lower than 50% of the Clinton and McCain contribution amounts. The striking visual is that Obama has many more dots overall.
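The boxplot comparison above amounts to checking whether one group's third quartile lies below another group's median. The Python sketch below shows that check on made-up amounts. The numbers are purely hypothetical and are not drawn from the data set. `method="inclusive"` is used on the assumption that it mirrors R's default quantile definition.

```python
from statistics import median, quantiles

# Hypothetical amounts for illustration only (not taken from the data set).
amounts = {
    "Obama": [10, 25, 50, 75, 200],
    "Clinton": [25, 50, 100, 250, 1000],
}

# quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
obama_q1, obama_q2, obama_q3 = quantiles(amounts["Obama"], n=4, method="inclusive")
clinton_median = median(amounts["Clinton"])

# True when the top of the Obama box sits below the Clinton median line.
print(obama_q3, clinton_median, obama_q3 < clinton_median)
```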

Violin plots of the same data are shown for comparison purposes. One can see the plots get wider at common contribution amounts (see the election type violin plot for a good example of this at $500 and $1000).

I tried to make some heatmaps to show the relationships of contributions with occupation categories and cities. The information conveyed by these maps confirms that Obama received much larger counts of contributions.

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the features(s) of interest vary with other features in the dataset?

I looked at many of the relationships between the categorical variables to see if anything interesting popped out of the data. I did notice that McCain received a larger share of contributions from the Retired category than from other occupation categories. He also received a larger share of contributions from Charleston.

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

I did notice that the amount of contributions was much greater during the primaries than the general election. I think this can be attributed to two factors: 1) there are far more candidates during the primaries and 2) the electorate seems to be more engaged during the primaries because West Virginia is a “red” state, so many may decide not to contribute during the general election cycle as they think it may not make a difference.

What was the strongest relationship you found?

The strongest relationship I found was that no matter how the data is looked at, Obama tends to dominate the contributions across occupation and city.

Multivariate Plots Section

I used most of the same bivariate plots, but added faceting in order to add another variable for analysis since most of the variables are categorical.

The previous box plots are all analyzing contribution amounts split by the following:

It looks like Homemakers did not contribute as much to Clinton (speculation: maybe this has something to do with gender, but more research and data would be needed). The plot by candidate and date is interesting because I am surprised to see contributions to Clinton at these late dates corresponding to the general election.

The previous box plots are all analyzing contribution amounts split by the following:

One thing to notice is the Legal contributions from Charleston. There appear to be more contributions than in the other top cities. Wheeling looks like it has fewer contributions from Retired.

The previous box plots are all analyzing contribution amounts split by the following:

More research should be conducted on the contributions classified as primary contributions in September and October. These might need to be fixed within the data set. A quick look at the contributions made on these dates that are classified as primary contributions shows that at least one has a note that a redesignation was requested, so that may be what needs to happen with all of these contributions.

##                 contbr_nm                 cand_nm contb_receipt_amt
## 569   RIEDEL, PAUL B. MR.          McCain, John S               500
## 5909      BARNES, PHYLLIS Clinton, Hillary Rodham               250
## 5943  WICH, JOAN HOHLT MS Clinton, Hillary Rodham              2300
## 5948    BERTINUSON, JANET Clinton, Hillary Rodham               100
## 9087      PLESA, JOHN MR.          McCain, John S                25
## 9093 NELSON, ROMEY L. MR.          McCain, John S               100
##                 receipt_desc election_tp contb_receipt_dt
## 569  REDESIGNATION REQUESTED       P2008        30-Sep-08
## 5909                               P2008        23-Oct-08
## 5943                               P2008        16-Oct-08
## 5948                               P2008        23-Oct-08
## 9087                               P2008        16-Oct-08
## 9093                               P2008        16-Oct-08
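A check for this kind of inconsistency could look like the Python sketch below, which flags rows labelled as primary contributions but dated in September or October 2008. It is an illustration, not the R code used for the table above. The function name `late_primary_rows` is hypothetical, and the sample rows are copied from output shown in this report.

```python
from datetime import datetime


def late_primary_rows(rows, months=(9, 10), year=2008):
    """Return rows marked P2008 whose receipt date falls in the given months."""
    flagged = []
    for row in rows:
        # Dates in the file look like "30-Sep-08" (day, month name, 2-digit year).
        date = datetime.strptime(row["contb_receipt_dt"], "%d-%b-%y")
        if row["election_tp"] == "P2008" and date.year == year and date.month in months:
            flagged.append(row)
    return flagged


rows = [
    {"contbr_nm": "RIEDEL, PAUL B. MR.", "election_tp": "P2008", "contb_receipt_dt": "30-Sep-08"},
    {"contbr_nm": "BARNES, PHYLLIS", "election_tp": "P2008", "contb_receipt_dt": "23-Oct-08"},
    {"contbr_nm": "HILDENBRAND, OLGA I", "election_tp": "P2008", "contb_receipt_dt": "30-Jun-07"},
    {"contbr_nm": "HILDENBRAND, OLGA I", "election_tp": "G2008", "contb_receipt_dt": "30-Jun-07"},
]
late = late_primary_rows(rows)
print([row["contbr_nm"] for row in late])
```

The `%b` directive depends on the locale's month abbreviations, so the sketch assumes the default English names.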

The previous two plots are bar charts that are similar to the box plots. I only created charts that looked at contribution amounts related to candidate/election type and candidate/occupation category because I did not notice anything particular in the various boxplots of the same variables.

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

The relationships I looked at involved looking at contribution amounts split by two of the categorical variables of interest. It was interesting to notice that McCain did not receive any contributions from the Unemployed or Political Categories. I was surprised to see that Clinton received some donations in October since she was not in the general election.

Were there any interesting or surprising interactions between features?

I was surprised to see that Clinton actually received more contributions during the primary election cycle. It was also surprising to see that Obama received a lot of contributions for the general cycle while McCain did not.

OPTIONAL: Did you create any models with your dataset? Discuss the strengths and limitations of your model.

No.


Final Plots and Summary

Plot One

Description One

The above bar chart shows how the contribution amounts differed between the primary election cycle and the general election cycle in 2008. I chose to show the top 5 candidates by contribution amount to see the drop-off between the primary and general cycles. Hillary Clinton received the most money on the Democratic side, and she subsequently won the primary in West Virginia with 66.93% of the vote (West Virginia Democratic primary, 2008). What’s interesting to note is the significant advantage Obama had in contribution amounts for the general election cycle versus McCain. A hypothesis could be that the Clinton supporters who gave her huge amounts of money during the primaries shifted their money to Obama in the general election. It didn’t matter: McCain got 55.60% of the vote in West Virginia (United States presidential election in West Virginia, 2008).

Plot Two

Description Two

This plot shows how each of the top three candidates fared with each of the occupation categories in terms of contribution money received. I was surprised that the “religious” category contributed so little compared with other occupation categories (there are many churches in the city I reside in). I am also surprised in general at how much more the Democrats received than McCain received. Perhaps a plausible explanation is that West Virginia is a “red” state, so many Republicans do not feel the need to donate money to the Republican because he will win anyway. Obama dominated the major contribution categories of Medical, Other, and Retired. McCain broke even with Homemakers.

Plot Three

Description Three

This box plot shows a comparison of the distributions of contribution amounts by occupation category. The Other, Medical, Executive, Homemaker, Legal, and Self-Employed categories have the highest median contributions, but they are also the most spread out. The Retired, Education, Unemployed, Student, Political, and Religious categories have the lowest median contribution amounts. This makes sense since retired people usually have lower, fixed incomes, educators often report low wages, unemployed people probably cannot afford to contribute high amounts, and students typically have little money to spare. I chose to zoom in on the data to get a better look at and compare the majority of the data shown in the box plots.


Reflection

I selected information on election contributions because I was interested to see trends. After downloading the data, I quickly realized that many of the techniques I am familiar with (histograms, scatterplots) would not be so useful with this data set because there is really only one quantitative variable of interest (contribution amount). I had to really think about which categorical variables were the most important and then determine the best way to use faceting to create meaningful plots. I enjoyed working with this data set as I learned a lot of useful techniques and tools with ggplot2. I was successful in wrangling the occupation categories to create more meaningful analysis across occupations. Future analysis could focus on the specific dates within the data set. Time series data for each candidate could be plotted in order to see any trends over time. It would be interesting to see candidates who still receive contributions after they drop out of the primary, and how the amounts wane over time. I also discovered that wrangling and cleaning does not end when the analysis starts. I found several instances of inconsistent data. In one case, I was able to fix the errors (the redesignations). In the other, I need to complete further research to determine why some contributions late in the election cycle (September, October) are being classified as primary contributions.
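As a starting point for the time series idea mentioned above, contributions could be totalled per candidate and month before plotting. The Python sketch below illustrates one possible aggregation and is not part of the analysis that was performed. The function name `monthly_totals` is hypothetical, and the sample rows are copied from the late-primary table earlier in this report.

```python
from collections import defaultdict
from datetime import datetime


def monthly_totals(rows):
    """Sum contribution amounts per (candidate, year-month)."""
    totals = defaultdict(float)
    for name, date_text, amount in rows:
        date = datetime.strptime(date_text, "%d-%b-%y")
        totals[(name, date.strftime("%Y-%m"))] += amount
    return dict(sorted(totals.items()))


# (cand_nm, contb_receipt_dt, contb_receipt_amt)
rows = [
    ("McCain, John S", "30-Sep-08", 500),
    ("McCain, John S", "16-Oct-08", 25),
    ("McCain, John S", "16-Oct-08", 100),
    ("Clinton, Hillary Rodham", "23-Oct-08", 250),
    ("Clinton, Hillary Rodham", "16-Oct-08", 2300),
    ("Clinton, Hillary Rodham", "23-Oct-08", 100),
]
totals = monthly_totals(rows)
for key, value in totals.items():
    print(key, value)
```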